Abstract


Recent price-of-anarchy analyses of games of complete information suggest that 
coarse correlated equilibria, which characterize outcomes resulting from no-regret 
learning dynamics, have near-optimal welfare. This work provides two main technical
results that lift this conclusion to games of incomplete information, a.k.a.,
Bayesian games. First, near-optimal welfare in Bayesian games follows directly 
from the smoothness-based proof of near-optimal welfare in the same game when 
the private information is public. Second, no-regret learning dynamics converge 
to Bayesian coarse correlated equilibrium in these incomplete information games. 
These results are enabled by interpretation of a Bayesian game as a stochastic 
game of complete information. 


1 Introduction 

A recent confluence of results from game theory and learning theory gives a simple explanation for 
why good outcomes in large families of strategically-complex games can be expected. The advance 
comes from (a) a relaxation of the classical notion of equilibrium in games to one that corresponds to
the outcome attained when players’ behavior ensures asymptotic no-regret, e.g., via standard online 
learning algorithms such as weighted majority, and (b) an extension theorem that shows that the 
standard approach for bounding the quality of classical equilibria automatically implies the same 
bounds on the quality of no-regret equilibria. This paper generalizes these results from static games 
to Bayesian games, for example, auctions. 

Our motivation for considering learning outcomes in Bayesian games is the following. Many important
games model repeated interactions between an uncertain set of participants. Sponsored search,
and more generally, online ad-auction marketplaces, are important examples of such games. Platforms
are running millions of auctions, with each individual auction slightly different and of only
very small value, but such marketplaces have high enough volume to be the financial basis of large
industries. This online auction environment is best modeled by a repeated Bayesian game: the auction
game is repeated over time, with the set of participants slightly different each time, depending
on many factors from budgets of the players to subtle differences in the opportunities.

A canonical example to which our methods apply is a single-item first-price auction with players’
values for the item drawn from a product distribution. In such an auction, players simultaneously 
submit sealed bids and the player with the highest bid wins and pays her bid. The utility of the 
winner is her value minus her bid; the utilities of the losers are zero. When the values are drawn from 
non-identical continuous distributions the Bayes-Nash equilibrium is given by a differential equation
that is not generally analytically tractable, cf. [8] (and generalizations of this model are computationally
hard, see [3]). Again, though their Bayes-Nash equilibria are complex, we show that good outcomes
can be expected in these kinds of auctions. 
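
As a concrete illustration of the auction rules just described, the following minimal sketch computes the utilities of one first-price auction. The function name `first_price_utilities` and the tie-breaking rule (lowest index wins) are assumptions made for this example; the text does not specify how ties are resolved.

```python
from typing import List, Sequence


def first_price_utilities(values: Sequence[float], bids: Sequence[float]) -> List[float]:
    """Utilities in a single-item, sealed-bid first-price auction.

    The highest bidder wins and pays her own bid; all losers get zero.
    Ties go to the lowest index (an assumption of this sketch).
    """
    if len(values) != len(bids) or not bids:
        raise ValueError("values and bids must be non-empty and of equal length")
    # Highest bid wins; among equal bids, the smallest index wins.
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    return [values[i] - bids[i] if i == winner else 0.0 for i in range(len(bids))]


# Two bidders with values 1.0 and 0.5 who each bid half their value.
example = first_price_utilities([1.0, 0.5], [0.5, 0.25])
```

Here the first bidder wins and keeps her value minus her bid, while the second bidder gets zero.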

Our approach to proving that good equilibria can be expected in repeated Bayesian games is to 
extend an analogous result for static games,^1 i.e., the setting where the same game with the same
payoffs and the same players is repeated. Nash equilibrium is the classical model of equilibrium for 
each stage of the static game. In such an equilibrium the strategies of players may be randomized; 
however, the randomizations of the players are independent. To measure the quality of outcomes in 
games Koutsoupias and Papadimitriou [9] introduced the price of anarchy, the ratio of the quality
of the worst Nash equilibrium over a socially optimal solution. Price of anarchy results have been
shown for large families of games, with a focus on those relevant for computer networks. Roughgarden
[11] identified the canonical approach for bounding the price of anarchy of a game as showing
that it satisfies a natural smoothness condition.

There are two fundamental flaws with Nash equilibrium as a description of strategic behavior. First, 
computing a Nash equilibrium can be PPAD hard and, thus, neither should efficient algorithms for 
computing a Nash equilibrium be expected nor should any dynamics (of players with bounded computational
capabilities) converge to a Nash equilibrium. Second, natural behavior tends to introduce
correlations in strategies and therefore does not converge to Nash equilibrium even in the limit. 
Both of these issues can be resolved for large families of games. First, there are relaxations of Nash 
equilibrium which allow for correlation in the players’ strategies. Of these, this paper will focus 
on coarse correlated equilibrium which requires the expected payoff of a player for the correlated 
strategy be no worse than the expected payoff of any action at the player’s disposal. Second, it was 
proven by Blum et al. [2] that the (asymptotic) no-regret property of many online learning algorithms
implies convergence to the set of coarse correlated equilibria.^2

Blum et al. [2] extended the definition of the price of anarchy to outcomes obtained when each
player follows a no-regret learning algorithm.^3 As coarse correlated equilibria generalize Nash
equilibria, it could be that the worst-case equilibrium under the former is worse than under the latter.
Roughgarden [11], however, observed that there is often no degradation; specifically, the very same
smoothness property that he identified as implying good welfare in Nash equilibrium also proves 
good welfare of coarse correlated equilibrium (equivalently: for outcomes from no-regret learners). 
Thus, for a large family of static games, we can expect strategic behavior to lead to good outcomes. 

This paper extends this theory to Bayesian games. Our contribution is two-fold: (i) We show an 
analog of the convergence of no-regret learning to coarse correlated equilibria in Bayesian games, 
which is of interest independently of our price of anarchy analysis; and (ii) we show that the coarse 
correlated equilibria of the Bayesian version of any smooth static game have good welfare. Combining
these results, we conclude that no-regret learning in smooth Bayesian games achieves good
welfare. 

These results are obtained as follows. It is possible to view a Bayesian game as a stochastic game, 
i.e., where the payoff structure is fixed but there is a random action on the part of Nature. This 
viewpoint applied to the above auction example considers a population of bidders associated with
each player and, in each stage, Nature uniformly at random selects one bidder from each population
to participate in the auction. We re-interpret and strengthen a result of Syrgkanis and Tardos [12]
by showing that the smoothness property of the static game (for any fixed profile of bidder values) 
implies smoothness of this stochastic game. From the perspective of coarse correlated equilibrium, 
there is no difference between a stochastic game and the non-stochastic game with each random 
variable replaced with its expected value. Thus, the smoothness framework of Roughgarden [11]
extends this result to imply that the coarse correlated equilibria of the stochastic game are good. 
To show that we can expect good outcomes in Bayesian games, it suffices to show that no-regret 
learning converges to the coarse correlated equilibrium of this stochastic game. Importantly, when 
we consider learning algorithms there is a distinction between the stochastic game where players’ 
payoffs are random variables and the non-stochastic game where players’ payoffs are the expectation
of these variables. Our analysis addresses this distinction and, in particular, shows that, in the
stochastic game on populations, no-regret learning converges almost surely to the set of coarse
correlated equilibria. This result implies that the average welfare of no-regret dynamics will be
good, almost surely, and not only in expectation over the random draws of Nature.

^1 In the standard terms of the game theory literature, we extend results for learning in games of complete
information to games of incomplete information.

^2 This result is a generalization of one of Foster and Vohra [7].

^3 They referred to this price of anarchy for no-regret learners as the price of total anarchy.

2 Preliminaries 

This section describes a general game theoretic environment which includes auctions and resource 
allocation mechanisms. For this general environment we review the results from the literature for 
analyzing the social welfare that arises from no-regret learning dynamics in repeated game play. 
The subsequent sections of the paper will generalize this model and these results to Bayesian games, 
a.k.a., games of incomplete information. 

General Game Form. A general game $\mathcal{M}$ is specified by a mapping from a profile $a \in \mathcal{A} =
\mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ of allowable actions of players to an outcome. Behavior in a game may result in
(possibly correlated) randomized actions $\mathbf{a} \in \Delta(\mathcal{A})$.^4 Player $i$’s utility in this game is determined
by a profile of individual values $v \in \mathcal{V} = \mathcal{V}_1 \times \cdots \times \mathcal{V}_n$ and the (implicit) outcome of the game; it
is denoted $U_i(\mathbf{a}; v_i) = \mathbb{E}_{a \sim \mathbf{a}}[U_i(a; v_i)]$. In games with a social planner or principal who does not
take an action in the game, the utility of the principal is $R(\mathbf{a}) = \mathbb{E}_{a \sim \mathbf{a}}[R(a)]$. In many games of
interest, such as auctions or allocation mechanisms, the utility of the principal is the revenue from
payments from the players. We will use the term mechanism and game interchangeably.

In a static game the payoffs of the players (given by v) are fixed. Subsequent sections will consider 
Bayesian games in the independent private value model, i.e., where player $i$’s value $v_i$ is drawn
independently from the other players’ values and is known only privately to player $i$. Classical
game theory assumes complete information for static games, i.e., that $v$ is known, and incomplete
information in Bayesian games, i.e., that the distribution over $\mathcal{V}$ is known. For our study of learning
in games no assumptions of knowledge are made; however, to connect to the classical literature 
we will use its terminology of complete and incomplete information to refer to static and Bayesian 
games, respectively. 

Social Welfare. We will be interested in analyzing the quality of the outcome of the game as
defined by the social welfare, which is the sum of the utilities of the players and the principal. We
will denote by $SW(\mathbf{a}; v) = \sum_{i \in [n]} U_i(\mathbf{a}; v_i) + R(\mathbf{a})$ the expected social welfare of mechanism $\mathcal{M}$
under a randomized action profile $\mathbf{a}$. For any valuation profile $v \in \mathcal{V}$ we will denote the optimal
social welfare, i.e., the maximum over outcomes of the game of the sum of utilities, by $\text{Opt}(v)$.

No-regret Learning and Coarse Correlated Equilibria. For complete information games, i.e., 
fixed valuation profile $v$, Blum et al. [2] analyzed repeated play of players using no-regret learning
algorithms, and showed that this play converges to a relaxation of Nash equilibrium, namely, coarse 
correlated equilibrium. 

Definition 1 (no regret). A player achieves no regret in a sequence of play $a^1, \ldots, a^T$ if his regret
against any fixed strategy $a_i'$ vanishes to zero:

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left( U_i(a_i', a_{-i}^t; v_i) - U_i(a^t; v_i) \right) = 0. \tag{1}$$
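
To make the quantity inside the limit of Definition 1 concrete, the sketch below computes the average regret of one player against one fixed action over a finite history of play. The names `average_regret`, `utility` and `history` are hypothetical helpers introduced for this illustration, and the two-round first-price history in the usage example is made up.

```python
from typing import Callable, Sequence, Tuple

Profile = Tuple[float, ...]


def average_regret(
    utility: Callable[[Profile], float],
    history: Sequence[Profile],
    player: int,
    fixed_action: float,
) -> float:
    """Average gain from replacing `player`'s action by `fixed_action` in every round.

    This is the finite-T term (1/T) * sum_t [U_i(a'_i, a^t_{-i}) - U_i(a^t)].
    """
    total = 0.0
    for profile in history:
        deviated = profile[:player] + (fixed_action,) + profile[player + 1:]
        total += utility(deviated) - utility(profile)
    return total / len(history)


def bidder0_utility(bids: Profile) -> float:
    """Bidder 0 has value 1.0 in a first-price auction and wins only with a strictly higher bid."""
    return 1.0 - bids[0] if bids[0] > bids[1] else 0.0


# Bidder 0 bid 0.5 and then 0.75 against a constant opposing bid of 0.25.
history = [(0.5, 0.25), (0.75, 0.25)]
regret_vs_half = average_regret(bidder0_utility, history, player=0, fixed_action=0.5)
```

A no-regret sequence is one for which this average is non-positive in the limit for every fixed action; in the short history above, always bidding 0.5 would have been better in the second round.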

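Definition 2 (coarse correlated equilibrium, CCE). A randomized action profile $\mathbf{a} \in \Delta(\mathcal{A})$ is a
coarse correlated equilibrium of a complete information game with valuation profile $v$ if for every
player $i$ and $a_i' \in \mathcal{A}_i$:

$$\mathbb{E}_{\mathbf{a}}\left[U_i(\mathbf{a}; v_i)\right] \ge \mathbb{E}_{\mathbf{a}}\left[U_i(a_i', \mathbf{a}_{-i}; v_i)\right] \tag{2}$$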
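
The CCE condition can be checked mechanically for a finite game. The sketch below, under the assumption that the correlated distribution is given explicitly as a dictionary from action profiles to probabilities, returns the largest expected gain any player can obtain from a fixed unilateral deviation; the distribution is a CCE exactly when this value is at most zero. The function name `cce_violation` and the two-player coordination game in the usage example are illustrative choices, not taken from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Profile = Tuple[int, ...]


def cce_violation(
    dist: Dict[Profile, float],
    utilities: Sequence[Callable[[Profile], float]],
    action_sets: Sequence[Sequence[int]],
) -> float:
    """Largest expected gain of any player from any fixed deviation a'_i.

    `dist` maps action profiles to probabilities. The distribution satisfies
    the CCE inequality for all players iff the returned value is <= 0.
    """
    worst = float("-inf")
    for i, u in enumerate(utilities):
        expected = sum(p * u(a) for a, p in dist.items())
        for dev in action_sets[i]:
            # Player i plays `dev` regardless of the recommended action.
            deviated = sum(p * u(a[:i] + (dev,) + a[i + 1:]) for a, p in dist.items())
            worst = max(worst, deviated - expected)
    return worst


def match(a: Profile) -> float:
    """Both players earn 1 when their actions coincide, else 0."""
    return 1.0 if a[0] == a[1] else 0.0


correlated = {(0, 0): 0.5, (1, 1): 0.5}   # perfectly correlated play
miscoordinated = {(0, 1): 1.0}            # point mass on a mismatch
gap_correlated = cce_violation(correlated, [match, match], [[0, 1], [0, 1]])
gap_miscoordinated = cce_violation(miscoordinated, [match, match], [[0, 1], [0, 1]])
```

The correlated distribution is a CCE (negative gap), while the mismatched point mass is not.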

Theorem 3 (Blum et al. [2]). The empirical distribution of actions of any no-regret sequence in a
repeated game converges to the set of CCE of the static game.

Price of Anarchy of CCE. Roughgarden [11] gave a unifying framework for comparing the social
welfare, under various equilibrium notions including coarse correlated equilibrium, to the optimal
social welfare by defining the notion of a smooth game. This framework was extended to games like
auctions and allocation mechanisms by Syrgkanis and Tardos [12].

^4 Bold-face symbols denote random variables.
Game/Mechanism                                             (λ, μ)         PoA         Reference
Simultaneous First Price Auction with Submodular Bidders   (1 - 1/e, 1)   e/(e - 1)   [12]
First Price Multi-Unit Auction                             (1 - 1/e, 1)   e/(e - 1)   [5]
First Price Position Auction                               (1/2, 1)       2           [12]
All-Pay Auction                                            (1/2, 1)       2           [12]
Greedy Combinatorial Auction with d-complements            (1 - 1/e, d)   de/(e - 1)  [10]
Proportional Bandwidth Allocation Mechanism                (1/4, 1)       4           [12]
Submodular Welfare Games                                   (1, 1)         2           [13, 11]
Congestion Games with Linear Delays                        (5/3, 1/3)     5/2         [11]

Figure 1: Examples of smooth games and mechanisms


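Definition 4 (smooth mechanism). A mechanism $\mathcal{M}$ is $(\lambda, \mu)$-smooth for some $\lambda, \mu \ge 0$ if there exists
an independent randomized action profile $\mathbf{a}^*(v) \in \Delta(\mathcal{A}_1) \times \cdots \times \Delta(\mathcal{A}_n)$ for each valuation profile
$v$, such that for any action profile $a \in \mathcal{A}$ and valuation profile $v \in \mathcal{V}$:

$$\sum_{i \in [n]} U_i(\mathbf{a}_i^*(v), a_{-i}; v_i) \ge \lambda \cdot \text{Opt}(v) - \mu \cdot R(a). \tag{3}$$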

Many important games and mechanisms satisfy this smoothness definition for various parameters
$\lambda$ and $\mu$ (see Figure 1); the following theorem shows that the welfare of any coarse correlated
equilibrium in any of these games is nearly optimal.

Theorem 5 (efficiency of CCE; [12]). If a mechanism is $(\lambda, \mu)$-smooth then the social welfare of
any coarse correlated equilibrium is at least $\frac{\lambda}{\max\{1, \mu\}}$ of the optimal welfare, i.e., the price of anarchy
satisfies $\text{PoA} \le \frac{\max\{1, \mu\}}{\lambda}$.
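
The argument behind Theorem 5 is short, and the following derivation sketches it. It is a reconstruction under two assumptions that are not spelled out in the text above: the principal's utility is non-negative, $R(\mathbf{a}) \ge 0$, and the players' total utility at equilibrium is non-negative (for instance because each player can opt out and obtain zero). Let $\mathbf{a}$ be a CCE for a fixed valuation profile $v$, and let $\mathbf{a}^*(v)$ be the deviation from Definition 4, drawn independently of $\mathbf{a}$.

```latex
% Step 1: apply the CCE inequality (2) to each deviation a_i^*(v), then smoothness (3).
\sum_{i \in [n]} U_i(\mathbf{a}; v_i)
  \;\ge\; \sum_{i \in [n]} \mathbb{E}_{\mathbf{a}}\!\left[ U_i(\mathbf{a}_i^*(v), \mathbf{a}_{-i}; v_i) \right]
  \;\ge\; \lambda \,\mathrm{Opt}(v) - \mu\, R(\mathbf{a}).

% Step 2a (\mu \le 1): add R(a) to both sides and use R(a) \ge 0.
SW(\mathbf{a}; v) \;\ge\; \lambda\,\mathrm{Opt}(v) + (1-\mu)\,R(\mathbf{a}) \;\ge\; \lambda\,\mathrm{Opt}(v).

% Step 2b (\mu > 1): divide Step 1 by \mu and use \sum_i U_i(a; v_i) \ge 0.
SW(\mathbf{a}; v)
  \;\ge\; \frac{1}{\mu}\sum_{i \in [n]} U_i(\mathbf{a}; v_i) + R(\mathbf{a})
  \;\ge\; \frac{\lambda}{\mu}\,\mathrm{Opt}(v).

% Combining both cases:
SW(\mathbf{a}; v) \;\ge\; \frac{\lambda}{\max\{1,\mu\}}\,\mathrm{Opt}(v).
```

The first inequality in Step 1 uses linearity of expectation to extend the CCE condition from fixed deviations to the randomized deviation $\mathbf{a}_i^*(v)$.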

Price of Anarchy of No-regret Learning. Following Blum et al. [2], Theorem 3 and Theorem 5
imply that no-regret learning dynamics have near-optimal social welfare.

Corollary 6 (efficiency of no-regret dynamics; [12]). If a mechanism is $(\lambda, \mu)$-smooth then
any no-regret dynamics of the repeated game with a fixed player set and valuation
profile achieves average social welfare at least $\frac{\lambda}{\max\{1, \mu\}}$ of the optimal welfare, i.e., the price of
anarchy satisfies $\text{PoA} \le \frac{\max\{1, \mu\}}{\lambda}$.
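
The bound $\max\{1, \mu\}/\lambda$ is easy to evaluate for the mechanism rows of Figure 1. The helper below is a small sketch for doing so; the name `poa_bound` is an assumption of this example. It is meant only for the auction and allocation mechanisms in the figure: the last two rows (games without a principal) follow the conventions of the original smooth-games framework and are not covered by this formula.

```python
import math


def poa_bound(lam: float, mu: float) -> float:
    """Price-of-anarchy bound max{1, mu} / lam for a (lam, mu)-smooth mechanism."""
    if lam <= 0 or mu < 0:
        raise ValueError("requires lam > 0 and mu >= 0")
    return max(1.0, mu) / lam


# Rows of Figure 1 for mechanisms with a principal.
first_price = poa_bound(1 - 1 / math.e, 1.0)   # e / (e - 1)
all_pay = poa_bound(0.5, 1.0)                  # 2
bandwidth = poa_bound(0.25, 1.0)               # 4
```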

Importantly, Corollary 6 holds the valuation profile $v \in \mathcal{V}$ fixed throughout the repeated game play.
The main contribution of this paper is in extending this theory to games of incomplete information, 
e.g., where the values of the players are drawn at random in each round of game play. 

3 Population Interpretation of Bayesian Games 

In the standard independent private value model of a Bayesian game there are $n$ players. Player $i$
has type $v_i$ drawn uniformly from the set of types $\mathcal{V}_i$ (and this distribution is denoted $\mathcal{F}_i$).^5 We will
restrict attention to the case when the type space $\mathcal{V}_i$ is finite. A player’s strategy in this Bayesian
game is a mapping $s_i : \mathcal{V}_i \to \mathcal{A}_i$ from a valuation $v_i \in \mathcal{V}_i$ to an action $a_i \in \mathcal{A}_i$. We will denote
with $\Sigma_i = \mathcal{A}_i^{|\mathcal{V}_i|}$ the strategy space of each player and with $\Sigma = \Sigma_1 \times \cdots \times \Sigma_n$. In the game, each
player $i$ realizes his type $v_i$ from the distribution and then takes action $s_i(v_i)$ in the game.

In the population interpretation of the Bayesian game, also called the agent normal form representation
[6], there are $n$ finite populations of players. Each player in population $i$ has a type $v_i$ which we
assume to be distinct for each player in each population and across populations.^6 The set of players
in the population is denoted $\mathcal{V}_i$, and the player in population $i$ with type $v_i$ is called player $v_i$. In the
population game, each player $v_i$ chooses an action $s_i(v_i)$. Nature uniformly draws one player from
each population, and the game is played with those players’ actions. In other words, the utility of
player $v_i$ from population $i$ is:

$$U_{i,v_i}^{AG}(s) = \mathbb{E}_{\mathbf{v}}\left[U_i(s(\mathbf{v}); \mathbf{v}_i) \cdot 1\{\mathbf{v}_i = v_i\}\right] \tag{4}$$

^5 The restriction to the uniform distribution is without loss of generality for any finite type space and for any
distribution over the type space that involves only rational probabilities.

^6 The restriction to distinct types is without loss of generality as we can always augment a type space with
an index that does not affect player utilities.

Notice that the population interpretation of the Bayesian game is in fact a stochastic game of complete
information.
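
The utility in Equation (4) can be evaluated by enumerating Nature's draws. The sketch below does this for a first-price auction, assuming finite type sets with uniform draws as in the text; the names `agent_utility` and `first_price`, the tie-breaking rule (lowest population index wins) and the example types and bids are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def first_price(i: int, bids: Tuple[float, ...], value: float) -> float:
    """Stage utility of the player in role i; ties go to the lowest index (an assumption)."""
    winner = max(range(len(bids)), key=lambda j: (bids[j], -j))
    return value - bids[i] if i == winner else 0.0


def agent_utility(
    i: int,
    v_i: float,
    strategies: Sequence[Dict[float, float]],
    type_sets: Sequence[Sequence[float]],
    stage_utility: Callable[[int, Tuple[float, ...], float], float],
) -> float:
    """Expected utility of player `v_i` of population `i`, as in Equation (4).

    Nature draws one type per population uniformly and independently; the
    player earns the stage utility only in draws where her own type is picked.
    """
    profiles = list(product(*type_sets))
    total = 0.0
    for v in profiles:
        if v[i] != v_i:
            continue  # the indicator 1{v_i is drawn} is zero
        actions = tuple(strategies[j][v[j]] for j in range(len(v)))
        total += stage_utility(i, actions, v_i)
    return total / len(profiles)


# Two populations with types {0.5, 1.0}; every player bids half her value.
type_sets = [[0.5, 1.0], [0.5, 1.0]]
strategies = [{0.5: 0.25, 1.0: 0.5}, {0.5: 0.25, 1.0: 0.5}]
u_pop0_high = agent_utility(0, 1.0, strategies, type_sets, first_price)
u_pop1_high = agent_utility(1, 1.0, strategies, type_sets, first_price)
```

Note that the result is an unconditional expectation: it already includes the probability that the player is selected at all, which is why the values are smaller than the utility conditional on being drawn.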

There are multiple generalizations of coarse correlated equilibria from games of complete information
to games of incomplete information (cf. [6], [1], [4]). One of the canonical definitions is simply
the coarse correlated equilibrium of the stochastic game of complete information that is defined by
the population interpretation above.^7

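Definition 7 (Bayesian coarse correlated equilibrium - Bayes-CCE). A randomized strategy profile
$\mathbf{s} \in \Delta(\Sigma)$ is a Bayesian coarse correlated equilibrium if for every $a_i' \in \mathcal{A}_i$ and for every $v_i \in \mathcal{V}_i$:

$$\mathbb{E}_{\mathbf{s}}\mathbb{E}_{\mathbf{v}}\left[U_i(\mathbf{s}(\mathbf{v}); \mathbf{v}_i) \mid \mathbf{v}_i = v_i\right] \ge \mathbb{E}_{\mathbf{s}}\mathbb{E}_{\mathbf{v}}\left[U_i(a_i', \mathbf{s}_{-i}(\mathbf{v}_{-i}); \mathbf{v}_i) \mid \mathbf{v}_i = v_i\right] \tag{5}$$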
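
For small finite games the Bayes-CCE condition can be verified by brute force. The following sketch assumes uniform, independent types, a distribution over pure strategy profiles given as a dictionary, and a first-price stage game with ties broken toward the lowest index; all names (`bayes_cce_violation`, `first_price`) and the example bid grid are hypothetical. It returns the largest conditional gain over all players $v_i$ and fixed deviations $a_i'$, so the distribution is a Bayes-CCE (on the given action grid) exactly when the result is at most zero.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

Strategy = Tuple[float, ...]            # one action per type index of a population
StrategyProfile = Tuple[Strategy, ...]  # one strategy per population


def first_price(i: int, bids: Tuple[float, ...], value: float) -> float:
    """Stage utility of role i; ties go to the lowest index (an assumption)."""
    winner = max(range(len(bids)), key=lambda j: (bids[j], -j))
    return value - bids[i] if i == winner else 0.0


def bayes_cce_violation(
    dist: Dict[StrategyProfile, float],
    type_sets: Sequence[Sequence[float]],
    action_sets: Sequence[Sequence[float]],
    stage_utility: Callable[[int, Tuple[float, ...], float], float],
) -> float:
    """Largest conditional gain from a fixed action deviation, over all (i, v_i)."""
    profiles = list(product(*[range(len(t)) for t in type_sets]))
    worst = float("-inf")
    for i, types in enumerate(type_sets):
        for k, v_i in enumerate(types):
            # Condition on the event that type index k of population i is drawn.
            relevant = [v for v in profiles if v[i] == k]

            def expected(dev: Optional[float] = None) -> float:
                total = 0.0
                for s, p in dist.items():
                    for v in relevant:
                        acts = [s[j][v[j]] for j in range(len(v))]
                        if dev is not None:
                            acts[i] = dev
                        total += p * stage_utility(i, tuple(acts), v_i)
                return total / len(relevant)

            base = expected()
            for dev in action_sets[i]:
                worst = max(worst, expected(dev) - base)
    return worst


type_sets = [[0.5, 1.0], [0.5, 1.0]]
action_sets = [[0.0, 0.25], [0.0, 0.25]]
all_quarter = {((0.25, 0.25), (0.25, 0.25)): 1.0}  # everyone bids 0.25
all_zero = {((0.0, 0.0), (0.0, 0.0)): 1.0}         # everyone bids 0
gap_quarter = bayes_cce_violation(all_quarter, type_sets, action_sets, first_price)
gap_zero = bayes_cce_violation(all_zero, type_sets, action_sets, first_price)
```

On this two-point bid grid, everyone bidding 0.25 leaves no profitable deviation, whereas everyone bidding 0 lets the high type of the second population gain by outbidding.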

In a game of incomplete information the welfare in equilibrium will be compared to the expected
ex-post optimal social welfare $\mathbb{E}_{\mathbf{v}}[\text{Opt}(\mathbf{v})]$. We will refer to the worst-case ratio of the expected
optimal social welfare over the expected social welfare of any Bayes-CCE as Bayes-CCE-PoA.

4 Learning in Repeated Bayesian Game 

Consider a repeated version of the population interpretation of a Bayesian game. At each iteration
one player $v_i$ from each population is sampled uniformly and independently from other populations.
The set of chosen players then participate in an instance of a mechanism $\mathcal{M}$. We assume that each
player $v_i \in \mathcal{V}_i$ uses some no-regret learning rule to play in this repeated game.^8 In Definition 8 we
describe the structure of the game and our notation in more detail.

Definition 8. The repeated Bayesian game of $\mathcal{M}$ proceeds as follows. In stage $t$:

1. Each player $v_i \in \mathcal{V}_i$ in each population $i$ picks an action $s_i^t(v_i) \in \mathcal{A}_i$. We denote with
$s_i^t \in \mathcal{A}_i^{|\mathcal{V}_i|}$ the function that maps a player $v_i \in \mathcal{V}_i$ to his action.

2. From each population $i$ one player $v_i^t \in \mathcal{V}_i$ is selected uniformly at random. Let $v^t =
(v_1^t, \ldots, v_n^t)$ be the chosen profile of players and $s^t(v^t) = (s_1^t(v_1^t), \ldots, s_n^t(v_n^t))$ be the
profile of chosen actions.

3. Each player $v_i^t$ participates in an instance of game $\mathcal{M}$, in the role of player $i \in [n]$, with
action $s_i^t(v_i^t)$ and experiences a utility of $U_i(s^t(v^t); v_i^t)$. All players not selected in Step 2
experience zero utility.
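
The three steps of Definition 8 can be simulated directly. The sketch below runs a repeated first-price auction on populations in which every player (population, type) uses the Hedge (multiplicative weights) rule over a finite bid grid. Several choices here are assumptions made for illustration and are not prescribed by the text: Hedge as the no-regret rule, the learning rate `eta`, full counterfactual feedback for the selected players (the Remark below notes that utility-only feedback also suffices with suitable algorithms), and ties broken toward the lowest index.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def simulate_repeated_bayesian_auction(
    type_sets: Sequence[Sequence[float]],
    bid_grid: Sequence[float],
    rounds: int,
    eta: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float]:
    """Run Hedge learners in a repeated first-price auction on populations.

    Returns (average welfare, average optimal welfare) over `rounds` stages.
    """
    rng = random.Random(seed)
    n = len(type_sets)
    # One weight vector per player, i.e. per (population i, type v_i).
    weights: List[Dict[float, List[float]]] = [
        {v: [1.0] * len(bid_grid) for v in types} for types in type_sets
    ]
    welfare = optimum = 0.0
    for _ in range(rounds):
        # Step 1: every player of every population picks an action s_i^t(v_i).
        actions = [
            {v: rng.choices(range(len(bid_grid)), weights=w)[0] for v, w in pop.items()}
            for pop in weights
        ]
        # Step 2: Nature draws one player per population uniformly at random.
        drawn = [rng.choice(list(types)) for types in type_sets]
        bids = [bid_grid[actions[i][drawn[i]]] for i in range(n)]
        # Step 3: play the stage auction. Welfare (utilities plus revenue)
        # equals the winner's value in a first-price auction.
        winner = max(range(n), key=lambda j: (bids[j], -j))
        welfare += drawn[winner]
        optimum += max(drawn)
        # Hedge update for the drawn players only: unselected players earn
        # zero for every action, so their weights would not change.
        for i in range(n):
            w = weights[i][drawn[i]]
            for k, b in enumerate(bid_grid):
                trial = bids[:i] + [b] + bids[i + 1:]
                wins = max(range(n), key=lambda j: (trial[j], -j)) == i
                gain = (drawn[i] - b) if wins else 0.0
                w[k] *= math.exp(eta * gain)
            scale = max(w)
            for k in range(len(w)):
                w[k] /= scale  # renormalise to avoid overflow
    return welfare / rounds, optimum / rounds


avg_welfare, avg_optimum = simulate_repeated_bayesian_auction(
    type_sets=[[0.5, 1.0], [0.5, 1.0]],
    bid_grid=[0.0, 0.25, 0.5, 0.75],
    rounds=200,
    seed=1,
)
```

The ratio `avg_optimum / avg_welfare` is the finite-time analogue of the quantity whose limit is studied in the remainder of this section.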

Remark. We point out that for each player in a population to achieve no-regret he does not need 
to know the distribution of values in other populations. There exist algorithms that can achieve the 
no-regret property and simply require an oracle that returns the utility of a player at each iteration. 
Thus all we need to assume is that each player receives as feedback his utility at each iteration. ■ 


Remark. We also note that our results would extend to the case where at each period multiple 
matchings are sampled independently and players potentially participate in more than one instance 
of the mechanism $\mathcal{M}$ and potentially with different players from the remaining population. The only
thing that the players need to observe in such a setting is their average utility that resulted from their
action $s_i^t(v_i) \in \mathcal{A}_i$ from all the instances that they participated in during the given period. Such a scenario
seems an appealing model in online ad auction marketplaces where players receive only average 
utility feedback from their bids. ■ 


^7 This notion is the coarse analog of the agent normal form Bayes correlated equilibrium defined in Section
4.2 of Forges [6].

^8 An equivalent and standard way to view a Bayesian game is that each player draws his value independently
from his distribution each time the game is played. In this interpretation the player plays by choosing a strategy 
that maps his value to an action (or distribution over actions). In this interpretation our no-regret condition 
requires that the player not regret his actions for each possible value. 
Bayesian Price of Anarchy for No-regret Learners. In this repeated game setting we want to
compare the average social welfare of any sequence of play where each player uses a vanishing
regret algorithm versus the average optimal welfare. Moreover, we want to quantify the worst case of
such average welfare over all possible valuation distributions within each population:

$$\sup_{\mathcal{F}_1, \ldots, \mathcal{F}_n} \limsup_{T \to \infty} \frac{\sum_{t=1}^{T} \text{Opt}(v^t)}{\sum_{t=1}^{T} SW(s^t(v^t); v^t)} \tag{6}$$


We will refer to this quantity as the Bayesian price of anarchy for no-regret learners. The numerator
of this term is simply the average optimal welfare when players from each population are drawn
independently in each stage; it converges almost surely to the expected ex-post optimal welfare
$\mathbb{E}_{\mathbf{v}}[\text{Opt}(\mathbf{v})]$ of the stage game. Our main theorem is that if the mechanism is smooth and players
follow no-regret strategies then the average welfare is guaranteed to be close to the optimal welfare.

Theorem 9 (Main Theorem). If a mechanism is $(\lambda, \mu)$-smooth then the average (over time) welfare
of any no-regret dynamics of the repeated Bayesian game is at least
$\frac{\lambda}{\max\{1, \mu\}}$ of the average optimal welfare, i.e. $\text{PoA} \le \frac{\max\{1, \mu\}}{\lambda}$, almost surely.


Roadmap of the proof. In Section 5 we show that any vanishing regret sequence of play of the
repeated Bayesian game will converge almost surely to the Bayesian version of a coarse correlated
equilibrium of the incomplete information stage game. Therefore the Bayesian price of total anarchy
will be upper bounded by the efficiency guarantee of any Bayesian coarse correlated equilibrium.
Finally, in Section 6 we show that the price of anarchy bound of smooth mechanisms directly extends
to Bayesian coarse correlated equilibria, thereby providing an upper bound on the Bayesian price of
total anarchy of the repeated game.


Remark. We point out that our definition of Bayes-CCE is inherently different and more restricted
than the one defined in Caragiannis et al. [4]. There, a Bayes-CCE is defined as a joint distribution
$D$ over $\mathcal{V} \times \mathcal{A}$, such that if $(\mathbf{v}, \mathbf{a}) \sim D$ then for any $v_i \in \mathcal{V}_i$ and $a_i'(v_i) \in \mathcal{A}_i$:

$$\mathbb{E}_{(\mathbf{v}, \mathbf{a})}\left[U_i(\mathbf{a}; \mathbf{v}_i)\right] \ge \mathbb{E}_{(\mathbf{v}, \mathbf{a})}\left[U_i(a_i'(\mathbf{v}_i), \mathbf{a}_{-i}; \mathbf{v}_i)\right] \tag{7}$$

The main difference is that the product distribution defined by a distribution in $\Delta(\Sigma)$ and the distribution
of values cannot produce every possible joint distribution over $(\mathcal{V}, \mathcal{A})$; rather, the joint
distributions are restricted to satisfy a conditional independence property described by [6]. Namely,
player $i$’s action is conditionally independent of some other player $j$’s value, given player $i$’s
type. Such a conditional independence property is essential for the guarantees that we will present
in this work to extend to a Bayes-CCE, and hence they do not seem to extend to the notion given in [4].
However, as we will show in Section 5, the no-regret dynamics that we analyze, which are mathematically
equivalent to the dynamics in [4], do converge to this smaller set of Bayes-CCE that
we define and for which our efficiency guarantees will extend. This extra convergence property is
not needed when the mechanism satisfies the stronger semi-smoothness property defined in [4] and
thereby was not needed to show efficiency bounds in their setting. ■


5 Convergence of Bayesian No-Regret to Bayes-CCE

In this section we show that no-regret learning in the repeated Bayesian game converges almost
surely to the set of Bayesian coarse correlated equilibria. Any given sequence of play of the repeated
Bayesian game, which we defined in Definition 8, gives rise to a sequence of strategy-value pairs
$(s^t, v^t)$ where $s^t = (s_1^t, \ldots, s_n^t)$ and $s_i^t \in \mathcal{A}_i^{|\mathcal{V}_i|}$ captures the actions that each player in population
$i$ would have chosen, had they been picked. Then observe that all that matters to compute the average
social welfare of the game for any given time step $T$ is the empirical distribution of pairs $(s, v)$, up
till time step $T$, denoted as $D^T$, i.e. if $(\mathbf{s}^T, \mathbf{v}^T)$ is a random sample from $D^T$:

$$\frac{1}{T} \sum_{t=1}^{T} SW(s^t(v^t); v^t) = \mathbb{E}_{(\mathbf{s}^T, \mathbf{v}^T)}\left[SW(\mathbf{s}^T(\mathbf{v}^T); \mathbf{v}^T)\right] \tag{8}$$

Lemma 10 (Almost sure convergence to Bayes-CCE). Consider a sequence of play of the random
matching game, where each player uses a vanishing regret algorithm and let $D^T$ be the empirical
distribution of (strategy, valuation) profile pairs up till time step $T$. Consider any subsequence of
$\{D^T\}_T$ that converges in distribution to some distribution $D$. Then, almost surely, $D$ is a product
distribution, i.e. $D = D_s \times D_v$ with $D_s \in \Delta(\Sigma)$ and $D_v \in \Delta(\mathcal{V})$ such that $D_v = \mathcal{F}$ and
$D_s \in$ Bayes-CCE of the static incomplete information game with distributional beliefs $\mathcal{F}$.
Proof. We will denote with

$$r_i(a_i^*, a; v_i) = U_i(a_i^*, a_{-i}; v_i) - U_i(a; v_i),$$

the regret of player $v_i$ from population $i$, for action $a_i^*$ at action profile $a$. For a $v_i \in \mathcal{V}_i$ let $x_i^t(v_i) =
1\{v_i^t = v_i\}$. Since the sequence has vanishing regret for each player $v_i$ in population $i$, it must be
that for any $s_i^* \in \Sigma_i$:

$$\sum_{t=1}^{T} x_i^t(v_i) \cdot r_i(s_i^*(v_i), s^t(v^t); v_i) \le o(T) \tag{9}$$


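For any fixed $T$, let $D_s^T \in \Delta(\Sigma)$ denote the empirical distribution of $s^t$ and let $\mathbf{s}$ be a random sample
from $D_s^T$. For each $s \in \Sigma$, let $\mathcal{T}_s \subseteq [T]$ denote the time steps such that $s^t = s$ for each $t \in \mathcal{T}_s$. Then
we can re-write Equation (9) as:

$$\mathbb{E}_{\mathbf{s}}\left[\frac{1}{|\mathcal{T}_{\mathbf{s}}|} \sum_{t \in \mathcal{T}_{\mathbf{s}}} x_i^t(v_i) \cdot r_i(s_i^*(v_i), \mathbf{s}(v^t); v_i)\right] \le \frac{o(T)}{T} \tag{10}$$

For any $s \in \Sigma$ and $w \in \mathcal{V}$, let $\mathcal{T}_{s,w} = \{t \in \mathcal{T}_s : v^t = w\}$. Then we can re-write Equation (10) as:

$$\mathbb{E}_{\mathbf{s}}\left[\sum_{w \in \mathcal{V}} \frac{|\mathcal{T}_{\mathbf{s},w}|}{|\mathcal{T}_{\mathbf{s}}|} 1\{w_i = v_i\} \cdot r_i(s_i^*(v_i), \mathbf{s}(w); v_i)\right] \le \frac{o(T)}{T} \tag{11}$$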


Now we observe that $\frac{|\mathcal{T}_{s,w}|}{|\mathcal{T}_s|}$ is the empirical frequency of the valuation vector $w \in \mathcal{V}$, when filtered
at time steps where the strategy vector was $s$. Since at each time step $t$ the valuation vector $v^t$ is
picked independently from the distribution of valuation profiles $\mathcal{F}$, this is the empirical frequency
of $|\mathcal{T}_s|$ independent samples from $\mathcal{F}$.

By standard arguments from empirical processes theory, if $|\mathcal{T}_s| \to \infty$ then this empirical distribution
converges almost surely to the distribution $\mathcal{F}$. On the other hand, if $|\mathcal{T}_s|$ does not go to $\infty$, then the
empirical frequency of strategy $s$ vanishes to 0 as $T \to \infty$ and therefore has measure zero in the
above expectation as $T \to \infty$. Thus for any convergent subsequence of $\{D^T\}$, if $D$ is the limit
distribution and $s$ is in the support of $D$, then almost surely the distribution of $w$ conditional on
strategy $s$ is $\mathcal{F}$. Thus we can write $D$ as a product distribution $D_s \times \mathcal{F}$.

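Moreover, if we denote with $\mathbf{w}$ the random variable that follows distribution $\mathcal{F}$, then the limit of
Equation (11) for any convergent subsequence will give that:

$$\text{a.s.:} \quad \mathbb{E}_{\mathbf{s} \sim D_s}\mathbb{E}_{\mathbf{w} \sim \mathcal{F}}\left[1\{\mathbf{w}_i = v_i\} \cdot r_i(s_i^*(v_i), \mathbf{s}(\mathbf{w}); v_i)\right] \le 0$$

Equivalently, we get that $D_s$ will satisfy that for all $v_i \in \mathcal{V}_i$ and for all $s_i^*$:

$$\text{a.s.:} \quad \mathbb{E}_{\mathbf{s} \sim D_s}\mathbb{E}_{\mathbf{w} \sim \mathcal{F}}\left[r_i(s_i^*(\mathbf{w}_i), \mathbf{s}(\mathbf{w}); \mathbf{w}_i) \mid \mathbf{w}_i = v_i\right] \le 0$$

The latter is exactly the Bayes-CCE condition from Definition 7. Thus $D_s$ is in the set of
Bayes-CCE of the static incomplete information game among $n$ players, where the
type profile is drawn from $\mathcal{F}$. ■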


Given the latter convergence theorem we can easily conclude the following theorem,
whose proof is given in the supplementary material.

Theorem 11. The price of anarchy for Bayesian no-regret dynamics is upper bounded by the price 
of anarchy of Bayesian coarse correlated equilibria, almost surely. 


6 Efficiency of Smooth Mechanisms at Bayes Coarse Correlated Equilibria 

In this section we show that smoothness of a mechanism $\mathcal{M}$ implies that any Bayes-CCE of the
incomplete information setting achieves at least $\frac{\lambda}{\max\{1, \mu\}}$ of the expected optimal welfare. To show
this we will adopt the interpretation of Bayes-CCE that we used in the previous section, as coarse
correlated equilibria of a more complex normal form game: the stochastic agent normal form representation
of the Bayesian game. We can interpret this complex normal form game as the game
that arises from a complete information mechanism $\mathcal{M}^{AG}$ among $\sum_i |\mathcal{V}_i|$ players, which randomly
samples one player from each of the $n$ populations and where the utility of a player in the complete
information mechanism $\mathcal{M}^{AG}$ is given by Equation (4). The set of possible outcomes in this agent
game corresponds to the set of mappings from a profile of chosen players to an outcome in the underlying
mechanism $\mathcal{M}$. The optimal welfare of this game is then the expected ex-post optimal
welfare $\text{Opt}^{AG} = \mathbb{E}_{\mathbf{v}}[\text{Opt}(\mathbf{v})]$.

The main theorem that we will show is that whenever mechanism $\mathcal{M}$ is $(\lambda, \mu)$-smooth, then also
mechanism $\mathcal{M}^{AG}$ is $(\lambda, \mu)$-smooth. Then we will invoke a theorem of [12, 11], which shows that
any coarse correlated equilibrium of a complete information mechanism achieves at least $\frac{\lambda}{\max\{1, \mu\}}$
of the optimal welfare. By the equivalence between Bayes-CCE and CCE of this complete information
game, we get that every Bayes-CCE of the Bayesian game achieves at least $\frac{\lambda}{\max\{1, \mu\}}$ of the
expected optimal welfare.

Theorem 12 (From complete information to Bayesian smoothness). If a mechanism $\mathcal{M}$ is $(\lambda, \mu)$-smooth,
then for any vector of independent valuation distributions $\mathcal{F} = (\mathcal{F}_1, \ldots, \mathcal{F}_n)$, the complete
information mechanism $\mathcal{M}^{AG}$ is also $(\lambda, \mu)$-smooth.


Proof. Consider the following randomized deviation for each player $v_i \in \mathcal{V}_i$ in population $i$: He
randomly samples a valuation profile $\mathbf{w} \sim \mathcal{F}$. Then he plays according to the randomized action
$s_i^*(v_i, \mathbf{w}_{-i})$, i.e., the player deviates using the randomized action guaranteed by the smoothness
property of mechanism $\mathcal{M}$ for his type $v_i$ and the random sample of the types of the others $\mathbf{w}_{-i}$.


Consider an arbitrary action profile $s = (s_1, \ldots, s_n)$ for all players in all populations. In this
context it is better to think of each $s_i$ as a $|\mathcal{V}_i|$-dimensional vector in $\mathcal{A}_i^{|\mathcal{V}_i|}$ and to view $s$ as a $\sum_i |\mathcal{V}_i|$-dimensional
vector. Then with $s_{-v_i}$ we will denote all the components of this large vector except
the ones corresponding to player $v_i \in \mathcal{V}_i$. Moreover, we will be denoting with $\mathbf{v}$ a sample from $\mathcal{F}$
drawn by mechanism $\mathcal{M}^{AG}$. We now argue about the expected utility of player $v_i$ from this deviation,
which is:


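$$\mathbb{E}_{\mathbf{w}}\left[U_{i,v_i}^{AG}(s_i^*(v_i, \mathbf{w}_{-i}), s_{-v_i})\right] = \mathbb{E}_{\mathbf{w}}\mathbb{E}_{\mathbf{v}}\left[U_i(s_i^*(v_i, \mathbf{w}_{-i}), s_{-i}(\mathbf{v}_{-i}); v_i) \cdot 1\{\mathbf{v}_i = v_i\}\right]$$

Summing the latter over all players $v_i \in \mathcal{V}_i$ in population $i$:

$$\begin{aligned}
\sum_{v_i \in \mathcal{V}_i} \mathbb{E}_{\mathbf{w}}\left[U_{i,v_i}^{AG}(s_i^*(v_i, \mathbf{w}_{-i}), s_{-v_i})\right]
&= \mathbb{E}_{\mathbf{w},\mathbf{v}}\left[\sum_{v_i \in \mathcal{V}_i} U_i(s_i^*(v_i, \mathbf{w}_{-i}), s_{-i}(\mathbf{v}_{-i}); v_i) \cdot 1\{\mathbf{v}_i = v_i\}\right] \\
&= \mathbb{E}_{\mathbf{v},\mathbf{w}}\left[U_i(s_i^*(\mathbf{v}_i, \mathbf{w}_{-i}), s_{-i}(\mathbf{v}_{-i}); \mathbf{v}_i)\right] \\
&= \mathbb{E}_{\mathbf{v},\mathbf{w}}\left[U_i(s_i^*(\mathbf{w}_i, \mathbf{w}_{-i}), s_{-i}(\mathbf{v}_{-i}); \mathbf{w}_i)\right] \\
&= \mathbb{E}_{\mathbf{v},\mathbf{w}}\left[U_i(s_i^*(\mathbf{w}), s_{-i}(\mathbf{v}_{-i}); \mathbf{w}_i)\right],
\end{aligned}$$

where the second to last equation is an exchange of variable names and regrouping using independence.
Summing over populations and using smoothness of $\mathcal{M}$, we get smoothness of $\mathcal{M}^{AG}$:

$$\begin{aligned}
\sum_{i \in [n]} \sum_{v_i \in \mathcal{V}_i} \mathbb{E}_{\mathbf{w}}\left[U_{i,v_i}^{AG}(s_i^*(v_i, \mathbf{w}_{-i}), s_{-v_i})\right]
&= \mathbb{E}_{\mathbf{v},\mathbf{w}}\left[\sum_{i \in [n]} U_i(s_i^*(\mathbf{w}), s_{-i}(\mathbf{v}_{-i}); \mathbf{w}_i)\right] \\
&\ge \mathbb{E}_{\mathbf{v},\mathbf{w}}\left[\lambda\, \text{Opt}(\mathbf{w}) - \mu R(s(\mathbf{v}))\right] = \lambda\, \mathbb{E}_{\mathbf{w}}\left[\text{Opt}(\mathbf{w})\right] - \mu R^{AG}(s)
\end{aligned}$$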


Corollary 13. Every Bayes-CCE of the incomplete information setting of a $(\lambda, \mu)$-smooth mechanism
$\mathcal{M}$ achieves expected welfare at least $\frac{\lambda}{\max\{1, \mu\}}$ of the expected optimal welfare.

7 Finite Time Analysis and Convergence Rates

In the previous section we argued about the limit average efficiency of the game as time goes to 
infinity. In this section we analyze the convergence rate to Bayes-CCE and we show approximate
efficiency results even for finite time, when players are allowed to have some $\epsilon$-regret.

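Theorem 14. Consider the repeated matching game with a $(\lambda, \mu)$-smooth mechanism. Suppose that
for any $T \ge T^0$, each player in each of the $n$ populations has regret at most $\epsilon/n$. Then for every $\delta$
and $\rho$, there exists a $T^*(\delta, \rho)$, such that for any $T \ge \min\{T^0, T^*\}$, with probability $1 - \rho$:

$$\frac{1}{T} \sum_{t=1}^{T} SW(s^t(v^t); v^t) \ge \frac{\lambda}{\max\{1, \mu\}} \mathbb{E}_{\mathbf{v}}\left[\text{Opt}(\mathbf{v})\right] - \delta - \mu \cdot \epsilon \tag{12}$$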

Moreover, T*{d, p) < ^4-" log 
Supplementary material for
“No-Regret Learning in Bayesian Games” 

A Proof of Theorem 11

For readability we repeat the statements of Lemma 10 and Theorem 11 from the main text.


Lemma 10. Let $D \in \Delta(\Sigma \times \mathcal{V})$ be a joint distribution of (strategy, valuation) profile pairs. Consider
a sequence of play of the random matching game, where each player uses a vanishing regret
algorithm and let $D^T$ be the empirical distribution of (strategy, valuation) profile pairs up till time
step $T$. Suppose that there exists a subsequence of $\{D^T\}_T$ that converges in distribution to $D$. Then,
almost surely, $D$ is a product distribution, i.e. $D = D_s \times D_v$, with $D_s \in \Delta(\Sigma)$ and $D_v \in \Delta(\mathcal{V})$ such
that $D_v = \mathcal{F}$ and $D_s \in$ Bayes-CCE of the static incomplete information game with distributional
beliefs $\mathcal{F}$.


Theorem 11. The price of anarchy for Bayesian no-regret dynamics is upper bounded by the price of anarchy of Bayesian coarse correlated equilibria.


Proof. Let $D \in \Delta(\Sigma \times \mathcal{V})$ be a joint distribution such that there is a subsequence of $\{D^T\}_T$ converging in distribution to $D$. Then by Lemma 10, almost surely, $D$ is a product distribution, i.e. $D \in \Delta(\Sigma) \times \Delta(\mathcal{V})$, the marginal on $\mathcal{V}$ is equal to $\mathcal{F}$, and the marginal on $\Sigma$ is a Bayes-CCE of the static incomplete information game with distributional beliefs $\mathcal{F}$.

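Therefore, if $\rho$ is the Bayes-CCE-PoA of the mechanism, and if $(s, \mathbf{v})$ is a random sample from $D$, then almost surely:

$$\mathbb{E}_{s,\mathbf{v}}\left[SW(s(\mathbf{v}); \mathbf{v})\right] \ge \frac{1}{\rho}\, \mathbb{E}_{\mathbf{v}}\left[\mathrm{Opt}(\mathbf{v})\right] \qquad (13)$$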

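Thus the limit average social welfare of any convergent subsequence will be at least $\frac{1}{\rho}\mathbb{E}_{\mathbf{v}}\left[\mathrm{Opt}(\mathbf{v})\right]$, which then implies that almost surely:

$$\liminf_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} SW(s^t(v^t); v^t) \ge \frac{1}{\rho}\, \mathbb{E}_{\mathbf{v}}\left[\mathrm{Opt}(\mathbf{v})\right] = \frac{1}{\rho} \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} \mathrm{Opt}(v^t)$$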


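Thus for any event of non-zero measure and for any $\epsilon$, there exists an $f(\epsilon)$ such that for any $T \ge f(\epsilon)$:

$$\frac{1}{T}\sum_{t=1}^{T} SW(s^t(v^t); v^t) \ge \frac{1}{\rho} \cdot \frac{1}{T}\sum_{t=1}^{T} \mathrm{OPT}(v^t) - \epsilon$$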


Without loss of generality we can assume that $\mathbb{E}_{\mathbf{v}}\left[\mathrm{Opt}(\mathbf{v})\right] > 0$ (otherwise valuations are all zero and the theorem holds trivially). Since the average optimal welfare converges almost surely to $\mathbb{E}_{\mathbf{v}}\left[\mathrm{Opt}(\mathbf{v})\right]$, we get that for any event of non-zero measure, there exists a $g(\delta)$ such that for $T \ge g(\delta)$, $\frac{1}{T}\sum_{t=1}^{T} \mathrm{Opt}(v^t)$ is bounded away from zero. Thereby, we can turn the additive error into a multiplicative one, i.e. for any event of non-zero measure and for any $\epsilon'$ there exists $w(\epsilon')$ such that for any $T \ge w(\epsilon')$:


$$\frac{1}{T}\sum_{t=1}^{T} SW(s^t(v^t); v^t) \ge \frac{1}{\rho\,(1 + \epsilon')} \cdot \frac{1}{T}\sum_{t=1}^{T} \mathrm{OPT}(v^t)$$
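The passage from an additive to a multiplicative error can be spelled out in one line of algebra. The following is a sketch under assumed shorthand that does not appear in the text: $W_T$ is the average welfare up to time $T$, $A_T = \frac{1}{T}\sum_{t=1}^{T}\mathrm{OPT}(v^t)$ is the average optimal welfare, and $c > 0$ is a lower bound on $A_T$ valid for all sufficiently large $T$. Given a target $\epsilon'$, apply the additive guarantee with $\epsilon = \frac{c\,\epsilon'}{\rho(1+\epsilon')}$:

```latex
\begin{align*}
W_T &\ge \frac{1}{\rho} A_T - \epsilon
      && \text{(additive guarantee)} \\
    &\ge \frac{1}{\rho} A_T - \frac{\epsilon'}{\rho\,(1+\epsilon')} A_T
      && \text{(since } \epsilon = \tfrac{c\,\epsilon'}{\rho(1+\epsilon')} \text{ and } A_T \ge c\text{)} \\
    &= \frac{1}{\rho\,(1+\epsilon')} A_T .
\end{align*}
```

The choice of $\epsilon$ depends on $c$, which is why the argument first needs the average optimal welfare to be bounded away from zero.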






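This implies that almost surely:

$$\limsup_{T \to \infty} \frac{\sum_{t=1}^{T} \mathrm{OPT}(v^t)}{\sum_{t=1}^{T} SW(s^t(v^t); v^t)} \le \rho = \text{Bayes-CCE-PoA}$$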



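B Proof of Theorem 14


Theorem 14. Consider the repeated matching game with a $(\lambda, \mu)$-smooth mechanism. Suppose that for any $T \ge T^0$, each player in each of the $n$ populations has regret at most $\epsilon/n$. Then for every $\delta$ and $\rho$, there exists a $T^*(\delta, \rho)$, such that for any $T \ge \min\{T^0, T^*\}$, with probability $1 - \rho$:

$$\frac{1}{T}\sum_{t=1}^{T} SW(s^t(v^t); v^t) \ge \frac{\lambda}{\max\{1,\mu\}}\, \mathbb{E}_{\mathbf{v}}\left[\mathrm{OPT}(\mathbf{v})\right] - \delta - \mu \cdot \epsilon \qquad (14)$$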

Moreover, T*{S,p) < 54-ra log ^4^ 

Proof. Fix a population $i$ and a Bayesian strategy $s_i^* \in \Sigma_i$, as well as a Bayesian strategy profile $s \in \Sigma$. For shorter notation we will denote:

$$\pi_i(s_i^*, s, v) = U_i\big(s_i^*(v_i),\, s_{-i}(v_{-i});\, v_i\big).$$

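For a time step $T$, let $p^T(s) = \frac{|T_s|}{T}$ be the empirical distribution of a Bayesian strategy $s$, where $T_s$ is the set of time steps up to $T$ at which $s$ was played, and let $p^T(v|s)$ be the empirical distribution of values conditional on a Bayesian strategy $s$, i.e. the fraction of the time steps in $T_s$ at which the valuation profile was $v$. The average utility of a population $i$ up to time step $T$, when switching to a fixed Bayesian strategy $s_i^*$, can be written as:

$$\frac{1}{T}\sum_{t=1}^{T} \pi_i(s_i^*, s^t, v^t) = \sum_{s \in \Sigma} p^T(s) \sum_{v \in \mathcal{V}} p^T(v|s) \cdot \pi_i(s_i^*, s, v) \qquad (15)$$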
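The regrouping in Equation (15) is just a change in the order of summation: the time average of a payoff equals the sum over (strategy, valuation) pairs weighted by their empirical frequencies. A minimal sketch that checks this on a toy history follows. The function names, the toy history, and the toy payoff are all hypothetical and stand in for $\pi_i(s_i^*, s, v)$.

```python
from collections import Counter


def time_average(history, payoff):
    """Left-hand side of (15): average payoff over the observed rounds."""
    return sum(payoff(s, v) for s, v in history) / len(history)


def regrouped_average(history, payoff):
    """Right-hand side of (15): sum over s of p(s) * sum over v of p(v|s) * payoff."""
    total_rounds = len(history)
    count_s = Counter(s for s, _ in history)   # |T_s| for each strategy s
    count_sv = Counter(history)                # rounds with the pair (s, v)
    total = 0.0
    for (s, v), n_sv in count_sv.items():
        p_s = count_s[s] / total_rounds        # empirical p^T(s)
        p_v_given_s = n_sv / count_s[s]        # empirical p^T(v|s)
        total += p_s * p_v_given_s * payoff(s, v)
    return total


# Toy data: strategies are labels, valuations are integers.
history = [("a", 1), ("a", 2), ("b", 1), ("a", 1)]
payoff = lambda s, v: v * (2 if s == "a" else 3)
print(time_average(history, payoff), regrouped_average(history, payoff))
```

Both functions return the same number up to floating-point rounding, which is all that Equation (15) asserts.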

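We will show that for any $s_i^*$, there exists a $T^*(\delta, \rho)$ such that for any $T \ge T^*(\delta, \rho)$, with probability $1 - \rho$:

$$\sum_{s \in \Sigma} p^T(s) \sum_{v \in \mathcal{V}} p^T(v|s) \cdot \pi_i(s_i^*, s, v) \ge \sum_{s \in \Sigma} p^T(s)\, \mathbb{E}_{\mathbf{v}}\left[\pi_i(s_i^*, s, \mathbf{v})\right] - \delta \qquad (16)$$

where $\mathbf{v}$ is a random variable drawn from the distribution of valuation profiles $\mathcal{F}$. We will denote with $p(v)$ the density function implied by distribution $\mathcal{F}$.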

In what follows we will denote with $H = \max_{i \in [n],\, v_i \in V_i,\, x_i \in X_i} v_i(x_i)$ the maximum possible value of any player. Thus observe that the utility of any player is upper bounded by $H$ and that the revenue collected from any player at equilibrium is upper bounded by $H$.

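For a time period $T$, let $G = \{s \in \Sigma : p^T(s) \ge \zeta\}$. Then observe that:

$$\sum_{s \in \Sigma} p^T(s) \sum_{v \in \mathcal{V}} \big(p^T(v|s) - p(v)\big) \cdot \pi_i(s_i^*, s, v) \ge \sum_{s \in G} p^T(s) \sum_{v \in \mathcal{V}} \big(p^T(v|s) - p(v)\big) \cdot \pi_i(s_i^*, s, v) - \zeta \cdot |\Sigma| \cdot H$$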

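Observe that for any $s \in G$, $|T_s| \ge \zeta \cdot T$. Thus $p^T(v|s)$ is the empirical mean of at least $\zeta \cdot T$ independent random samples of a Bernoulli trial with success probability $p(v)$. Hence, by Hoeffding bounds, we have that $|p^T(v|s) - p(v)| \le t$ with probability at least $1 - 2\exp\left(-2 \cdot \zeta \cdot T \cdot t^2\right)$. Thus with that much probability we get:

$$\sum_{s \in \Sigma} p^T(s) \sum_{v \in \mathcal{V}} \big(p^T(v|s) - p(v)\big) \cdot \pi_i(s_i^*, s, v) \ge -t \cdot |\mathcal{V}| \cdot H - \zeta \cdot |\Sigma| \cdot H$$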
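The Hoeffding step can be turned into a small calculator. The sketch below covers a single (strategy, valuation) pair only and does not include any union bound over pairs or populations, so it illustrates the scaling in $\zeta$, $t$ and $\rho$ rather than the constant in the theorem. The function names are hypothetical.

```python
import math


def hoeffding_failure_probability(num_samples, t):
    """Two-sided Hoeffding bound for the mean of num_samples independent
    [0, 1]-valued samples deviating from its expectation by more than t."""
    return min(1.0, 2.0 * math.exp(-2.0 * num_samples * t * t))


def samples_needed(t, rho):
    """Smallest sample count for which the bound above is at most rho."""
    return math.ceil(math.log(2.0 / rho) / (2.0 * t * t))


def rounds_needed(zeta, t, rho):
    """Rounds T such that a strategy with empirical frequency at least zeta
    has been played at least samples_needed(t, rho) times."""
    return math.ceil(samples_needed(t, rho) / zeta)


needed = samples_needed(t=0.1, rho=0.05)
print("samples needed:", needed)
print("failure bound at that count:", hoeffding_failure_probability(needed, 0.1))
print("rounds needed when zeta = 0.2:", rounds_needed(0.2, 0.1, 0.05))
```

The required number of rounds grows like $\log(2/\rho) / (\zeta t^2)$, so it is logarithmic in $1/\rho$ and polynomial in $1/\zeta$ and $1/t$.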

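By setting $t = \frac{\delta}{2 \cdot |\mathcal{V}| \cdot H}$ and $\zeta = \frac{\delta}{2 \cdot |\Sigma| \cdot H}$, and choosing $T^*(\delta, \rho)$ large enough that the failure probability above is at most $\rho$, we get the claimed property in Equation (16).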

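Now suppose that after time step $T^0$ each player in a population has regret at most $\epsilon/n$. Thus the average utility of the population is at least the utility from switching to any fixed Bayesian strategy $s_i^*$, minus an error term of $\epsilon/n$:

$$\sum_{s \in \Sigma} p^T(s) \sum_{v \in \mathcal{V}} p^T(v|s)\, \pi_i(s_i, s, v) \ge \sum_{s \in \Sigma} p^T(s) \sum_{v \in \mathcal{V}} p^T(v|s)\, \pi_i(s_i^*, s, v) - \frac{\epsilon}{n} \qquad (17)$$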










From the previous analysis, for any $T \ge \min\{T^0, T^*(\delta, \rho)\}$, we get that with probability $1 - \rho$:

'^P^{s)'^p^{v\s)TTi{si,s,v) > ^p^(s)Ev [7ri(s*,s,v)] - ^ (18) 

ses v€V seE 

Summing over all populations and using the Bayesian smoothness property of the mechanism from Theorem 12, we have that with probability $1 - \rho$:

'^P^is)'^p'^{v\s)Y^TT,{si,s,v) > (AEv[Opt(v)] - pR^^^is)) - y -e 

sGE vGV i sGE 

> AEv [OPT(v)] -pY,p^{s)R^^{s) - ^ - e 

sGS 

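To conclude the theorem we observe that, since for any $s \in \Sigma$, $|p^T(v|s) - p(v)| \le \frac{\delta}{3 \cdot n \cdot |\mathcal{V}| \cdot H}$, we get that:

$$R^{AG}(s) = \sum_{v \in \mathcal{V}} p(v)\, R(s(v)) \le \sum_{v \in \mathcal{V}} p^T(v|s)\, R(s(v)) + \frac{\delta}{3} \qquad (19)$$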

This holds since the revenue collected from a player at any action in the support of an equilibrium is at most $H$. By the latter, we can combine the revenue on the right-hand side with the utility on the left-hand side. We can also bound the remaining $(\mu - 1)$ of the revenue by $(\mu - 1)$ of the average welfare minus $\epsilon$, since each player in each population can always drop out of the auction, and therefore his average utility at an $\epsilon$-regret sequence must be at least $-\epsilon$.

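Hence, we get that:

$$\sum_{s \in \Sigma} p^T(s) \sum_{v \in \mathcal{V}} p^T(v|s)\, SW(s(v); v) \ge \frac{\lambda}{\max\{1,\mu\}}\, \mathbb{E}_{\mathbf{v}}\left[\mathrm{Opt}(\mathbf{v})\right] - \delta - \mu \cdot \epsilon \qquad (20)$$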

Thus choosing T* (p, ^ log , we get the conditions of the theorem. ■ 
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